notebooks/community/migration/UJ10 Custom Training Prebuilt Container SKLearn.ipynb

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "copyright" }, "outputs": [], "source": [ "# Copyright 2021 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "title:migration,new" }, "source": [ "# Vertex SDK: Train and deploy an SKLearn model with pre-built containers (formerly hosted runtimes)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "install_aip" }, "source": [ "## Installation\n", "\n", "Install the latest (preview) version of Vertex SDK.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "install_aip" }, "outputs": [], "source": [ "! pip3 install -U google-cloud-aiplatform --user" ] }, { "cell_type": "markdown", "metadata": { "id": "install_storage" }, "source": [ "Install the Google *cloud-storage* library as well.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "install_storage" }, "outputs": [], "source": [ "! 
pip3 install google-cloud-storage" ] }, { "cell_type": "markdown", "metadata": { "id": "restart" }, "source": [ "### Restart the Kernel\n", "\n", "Once you've installed the Vertex SDK and Google *cloud-storage*, you need to restart the notebook kernel so it can find the packages.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "restart" }, "outputs": [], "source": [ "import os\n", "\n", "if not os.getenv(\"AUTORUN\"):\n", " # Automatically restart kernel after installs\n", " import IPython\n", "\n", " app = IPython.Application.instance()\n", " app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "id": "before_you_begin" }, "source": [ "## Before you begin\n", "\n", "### GPU run-time\n", "\n", "*Make sure you're running this notebook in a GPU runtime if you have that option. In Colab, select* **Runtime > Change Runtime Type > GPU**\n", "\n", "### Set up your GCP project\n", "\n", "**The following steps are required, regardless of your notebook environment.**\n", "\n", "1. [Select or create a GCP project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", "\n", "2. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)\n", "\n", "3. [Enable the Vertex APIs and Compute Engine APIs.](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component)\n", "\n", "4. [Google Cloud SDK](https://cloud.google.com/sdk) is already installed in Google Cloud Notebooks.\n", "\n", "5. Enter your project ID in the cell below. 
Then run the cell to make sure the\n", "Cloud SDK uses the right project for all the commands in this notebook.\n", "\n", "**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "set_project_id" }, "outputs": [], "source": [ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "autoset_project_id" }, "outputs": [], "source": [ "if PROJECT_ID == \"\" or PROJECT_ID is None or PROJECT_ID == \"[your-project-id]\":\n", "    # Get your GCP project id from gcloud\n", "    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null\n", "    PROJECT_ID = shell_output[0]\n", "    print(\"Project ID:\", PROJECT_ID)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "set_gcloud_project_id" }, "outputs": [], "source": [ "! gcloud config set project $PROJECT_ID" ] }, { "cell_type": "markdown", "metadata": { "id": "region" }, "source": [ "#### Region\n", "\n", "You can also change the `REGION` variable, which is used for operations\n", "throughout the rest of this notebook. The regions below are supported for Vertex AI. When possible, choose the region closest to you.\n", "\n", "- Americas: `us-central1`\n", "- Europe: `europe-west4`\n", "- Asia Pacific: `asia-east1`\n", "\n", "You cannot use a Multi-Regional Storage bucket for training with Vertex. Not all regions provide support for all Vertex services. 
For the latest support per region, see [Region support for Vertex AI services](https://cloud.google.com/vertex-ai/docs/general/locations).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "region" }, "outputs": [], "source": [ "REGION = \"us-central1\" # @param {type: \"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "timestamp" }, "source": [ "#### Timestamp\n", "\n", "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you create a timestamp for each session and append it to the names of the resources created in this tutorial.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "timestamp" }, "outputs": [], "source": [ "from datetime import datetime\n", "\n", "TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")" ] }, { "cell_type": "markdown", "metadata": { "id": "gcp_authenticate" }, "source": [ "### Authenticate your GCP account\n", "\n", "**If you are using Google Cloud Notebooks**, your environment is already\n", "authenticated. Skip this step.\n", "\n", "*Note: If you are on a Vertex notebook and run the cell, the cell skips the authentication steps.*\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gcp_authenticate" }, "outputs": [], "source": [ "import os\n", "import sys\n", "\n", "# If you are running this notebook in Colab, run this cell and follow the\n", "# instructions to authenticate your Google Cloud account. 
This provides access\n", "# to your Cloud Storage bucket and lets you submit training jobs and prediction\n", "# requests.\n", "\n", "# If on Vertex, then don't execute this code\n", "if not os.path.exists(\"/opt/deeplearning/metadata/env_version\"):\n", "    if \"google.colab\" in sys.modules:\n", "        from google.colab import auth as google_auth\n", "\n", "        google_auth.authenticate_user()\n", "\n", "    # If you are running this tutorial in a notebook locally, replace the string\n", "    # below with the path to your service account key and run this cell to\n", "    # authenticate your Google Cloud account.\n", "    else:\n", "        %env GOOGLE_APPLICATION_CREDENTIALS your_path_to_credentials.json\n", "\n", "        # Log in to your account on Google Cloud\n", "        ! gcloud auth login" ] }, { "cell_type": "markdown", "metadata": { "id": "bucket:batch_prediction" }, "source": [ "### Create a Cloud Storage bucket\n", "\n", "**The following steps are required, regardless of your notebook environment.**\n", "\n", "This tutorial is designed to use training data that is in a public Cloud Storage bucket and a local Cloud Storage bucket for your batch predictions. You may alternatively use your own training data that you have stored in a local Cloud Storage bucket.\n", "\n", "Set the name of your Cloud Storage bucket below. 
It must be unique across all Cloud Storage buckets.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "bucket" }, "outputs": [], "source": [ "BUCKET_NAME = \"[your-bucket-name]\" # @param {type:\"string\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "autoset_bucket" }, "outputs": [], "source": [ "if BUCKET_NAME == \"\" or BUCKET_NAME is None or BUCKET_NAME == \"[your-bucket-name]\":\n", " BUCKET_NAME = PROJECT_ID + \"aip-\" + TIMESTAMP" ] }, { "cell_type": "markdown", "metadata": { "id": "create_bucket" }, "source": [ "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "create_bucket" }, "outputs": [], "source": [ "! gsutil mb -l $REGION gs://$BUCKET_NAME" ] }, { "cell_type": "markdown", "metadata": { "id": "validate_bucket" }, "source": [ "Finally, validate access to your Cloud Storage bucket by examining its contents:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "validate_bucket" }, "outputs": [], "source": [ "! 
gsutil ls -al gs://$BUCKET_NAME" ] }, { "cell_type": "markdown", "metadata": { "id": "setup_vars" }, "source": [ "### Set up variables\n", "\n", "Next, set up some variables used throughout the tutorial.\n", "\n", "### Import libraries and define constants\n" ] }, { "cell_type": "markdown", "metadata": { "id": "import_aip" }, "source": [ "#### Import Vertex SDK\n", "\n", "Import the Vertex SDK into our Python environment.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "import_aip" }, "outputs": [], "source": [ "import os\n", "import sys\n", "import time\n", "\n", "from google.cloud.aiplatform import gapic as aip\n", "from google.protobuf.json_format import MessageToJson, ParseDict\n", "from google.protobuf.struct_pb2 import Struct, Value" ] }, { "cell_type": "markdown", "metadata": { "id": "aip_constants" }, "source": [ "#### Vertex AI constants\n", "\n", "Set up the following constants for Vertex AI:\n", "\n", "- `API_ENDPOINT`: The Vertex AI API service endpoint for dataset, model, job, pipeline and endpoint services.\n", "- `PARENT`: The Vertex AI location root path for dataset, model and endpoint resources.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aip_constants" }, "outputs": [], "source": [ "# API Endpoint\n", "API_ENDPOINT = \"{}-aiplatform.googleapis.com\".format(REGION)\n", "\n", "# Vertex AI location root path for your dataset, model and endpoint resources\n", "PARENT = \"projects/\" + PROJECT_ID + \"/locations/\" + REGION" ] }, { "cell_type": "markdown", "metadata": { "id": "clients" }, "source": [ "## Clients\n", "\n", "The Vertex SDK works as a client/server model. 
On your side (the Python script) you will create a client that sends requests and receives responses from the server (Vertex).\n", "\n", "You will use several clients in this tutorial, so set them all up upfront.\n", "\n", "- Model Service for managed models.\n", "- Endpoint Service for deployment.\n", "- Job Service for batch jobs and custom training.\n", "- Prediction Service for serving.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "clients" }, "outputs": [], "source": [ "# client options same for all services\n", "client_options = {\"api_endpoint\": API_ENDPOINT}\n", "\n", "\n", "def create_model_client():\n", " client = aip.ModelServiceClient(client_options=client_options)\n", " return client\n", "\n", "\n", "def create_endpoint_client():\n", " client = aip.EndpointServiceClient(client_options=client_options)\n", " return client\n", "\n", "\n", "def create_prediction_client():\n", " client = aip.PredictionServiceClient(client_options=client_options)\n", " return client\n", "\n", "\n", "def create_job_client():\n", " client = aip.JobServiceClient(client_options=client_options)\n", " return client\n", "\n", "\n", "clients = {}\n", "clients[\"model\"] = create_model_client()\n", "clients[\"endpoint\"] = create_endpoint_client()\n", "clients[\"prediction\"] = create_prediction_client()\n", "clients[\"job\"] = create_job_client()\n", "\n", "for client in clients.items():\n", " print(client)" ] }, { "cell_type": "markdown", "metadata": { "id": "0ce08bfdc2d0" }, "source": [ "## Prepare a trainer script" ] }, { "cell_type": "markdown", "metadata": { "id": "1e930837e6a2" }, "source": [ "### Package assembly" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4f1f6164f0fa" }, "outputs": [], "source": [ "# Make folder for python training script\n", "! rm -rf custom\n", "! mkdir custom\n", "\n", "# Add package information\n", "! 
touch custom/README.md\n", "\n", "setup_cfg = \"[egg_info]\\n\\\n", "tag_build =\\n\\\n", "tag_date = 0\"\n", "! echo \"$setup_cfg\" > custom/setup.cfg\n", "\n", "setup_py = \"import setuptools\\n\\\n", "setuptools.setup(\\n\\\n", " install_requires=[\\n\\\n", " ],\\n\\\n", " packages=setuptools.find_packages())\"\n", "! echo \"$setup_py\" > custom/setup.py\n", "\n", "pkg_info = \"Metadata-Version: 1.0\\n\\\n", "Name: Custom Census Income\\n\\\n", "Version: 0.0.0\\n\\\n", "Summary: Demonstration training script\\n\\\n", "Home-page: www.google.com\\n\\\n", "Author: Google\\n\\\n", "Author-email: aferlitsch@google.com\\n\\\n", "License: Public\\n\\\n", "Description: Demo\\n\\\n", "Platform: Vertex AI\"\n", "! echo \"$pkg_info\" > custom/PKG-INFO\n", "\n", "# Make the training subfolder\n", "! mkdir custom/trainer\n", "! touch custom/trainer/__init__.py" ] }, { "cell_type": "markdown", "metadata": { "id": "40d7d11f7de7" }, "source": [ "### Task.py contents" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "96d9bd80ddda" }, "outputs": [], "source": [ "%%writefile custom/trainer/task.py\n", "# Single Instance Training for Census Income\n", "\n", "from sklearn.ensemble import RandomForestClassifier\n", "import joblib\n", "from sklearn.feature_selection import SelectKBest\n", "from sklearn.pipeline import FeatureUnion\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import LabelBinarizer\n", "import datetime\n", "import pandas as pd\n", "\n", "from google.cloud import storage\n", "\n", "import numpy as np\n", "import argparse\n", "import os\n", "import sys\n", "\n", "parser = argparse.ArgumentParser()\n", "parser.add_argument('--model-dir', dest='model_dir',\n", " default=os.getenv('AIP_MODEL_DIR'), type=str, help='Model dir.')\n", "args = parser.parse_args()\n", "\n", "print('Python Version = {}'.format(sys.version))\n", "\n", "# Public bucket holding the census data\n", "bucket = 
storage.Client().bucket('cloud-samples-data')\n", "\n", "# Path to the data inside the public bucket\n", "blob = bucket.blob('ai-platform/sklearn/census_data/adult.data')\n", "# Download the data\n", "blob.download_to_filename('adult.data')\n", "\n", "# Define the format of your input data including unused columns (These are the columns from the census data files)\n", "COLUMNS = (\n", " 'age',\n", " 'workclass',\n", " 'fnlwgt',\n", " 'education',\n", " 'education-num',\n", " 'marital-status',\n", " 'occupation',\n", " 'relationship',\n", " 'race',\n", " 'sex',\n", " 'capital-gain',\n", " 'capital-loss',\n", " 'hours-per-week',\n", " 'native-country',\n", " 'income-level'\n", ")\n", "\n", "# Categorical columns are columns that need to be turned into a numerical value to be used by scikit-learn\n", "CATEGORICAL_COLUMNS = (\n", " 'workclass',\n", " 'education',\n", " 'marital-status',\n", " 'occupation',\n", " 'relationship',\n", " 'race',\n", " 'sex',\n", " 'native-country'\n", ")\n", "\n", "\n", "# Load the training census dataset\n", "with open('./adult.data', 'r') as train_data:\n", " raw_training_data = pd.read_csv(train_data, header=None, names=COLUMNS)\n", "\n", "# Remove the column we are trying to predict ('income-level') from our features list\n", "# Convert the Dataframe to a lists of lists\n", "train_features = raw_training_data.drop('income-level', axis=1).values.tolist()\n", "# Create our training labels list, convert the Dataframe to a lists of lists\n", "train_labels = (raw_training_data['income-level'] == ' >50K').values.tolist()\n", "\n", "# Since the census data set has categorical features, we need to convert\n", "# them to numerical values. 
We'll use a list of pipelines to convert each\n", "# categorical column and then use FeatureUnion to combine them before calling\n", "# the RandomForestClassifier.\n", "categorical_pipelines = []\n", "\n", "# Each categorical column needs to be extracted individually and converted to a numerical value.\n", "# To do this, each categorical column will use a pipeline that extracts one feature column via\n", "# SelectKBest(k=1) and a LabelBinarizer() to convert the categorical value to a numerical one.\n", "# A scores array (created below) will select and extract the feature column. The scores array is\n", "# created by iterating over the COLUMNS and checking if it is a CATEGORICAL_COLUMN.\n", "for i, col in enumerate(COLUMNS[:-1]):\n", "    if col in CATEGORICAL_COLUMNS:\n", "        # Create a scores array to get the individual categorical column.\n", "        # Example:\n", "        # data = [39, 'State-gov', 77516, 'Bachelors', 13, 'Never-married', 'Adm-clerical',\n", "        #         'Not-in-family', 'White', 'Male', 2174, 0, 40, 'United-States']\n", "        # scores = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", "        #\n", "        # Returns: [['State-gov']]\n", "        # Build the scores array.\n", "        scores = [0] * len(COLUMNS[:-1])\n", "        # This column is the categorical column we want to extract.\n", "        scores[i] = 1\n", "        skb = SelectKBest(k=1)\n", "        skb.scores_ = scores\n", "        # Convert the categorical column to a numerical value\n", "        lbn = LabelBinarizer()\n", "        r = skb.transform(train_features)\n", "        lbn.fit(r)\n", "        # Create the pipeline to extract the categorical feature\n", "        categorical_pipelines.append(\n", "            ('categorical-{}'.format(i), Pipeline([\n", "                ('SKB-{}'.format(i), skb),\n", "                ('LBN-{}'.format(i), lbn)])))\n", "\n", "# Create pipeline to extract the numerical features\n", "skb = SelectKBest(k=6)\n", "# From COLUMNS use the features that are numerical\n", "skb.scores_ = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0]\n", "categorical_pipelines.append(('numerical', skb))\n", "\n", "# Combine all the 
features using FeatureUnion\n", "preprocess = FeatureUnion(categorical_pipelines)\n", "\n", "# Create the classifier\n", "classifier = RandomForestClassifier()\n", "\n", "# Transform the features and fit them to the classifier\n", "classifier.fit(preprocess.transform(train_features), train_labels)\n", "\n", "# Create the overall model as a single pipeline\n", "pipeline = Pipeline([\n", " ('union', preprocess),\n", " ('classifier', classifier)\n", "])\n", "\n", "# Split path into bucket and subdirectory\n", "bucket = args.model_dir.split('/')[2]\n", "subdir = args.model_dir.split('/')[-1]\n", "\n", "# Write model to a local file\n", "joblib.dump(pipeline, 'model.joblib')\n", "\n", "# Upload the model to GCS\n", "bucket = storage.Client().bucket(bucket)\n", "blob = bucket.blob(subdir + '/model.joblib')\n", "blob.upload_from_filename('model.joblib')\n" ] }, { "cell_type": "markdown", "metadata": { "id": "3022bce50fbf" }, "source": [ "### Store training script on your Cloud Storage bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "13409553e08b" }, "outputs": [], "source": [ "! rm -f custom.tar custom.tar.gz\n", "! tar cvf custom.tar custom\n", "! gzip custom.tar\n", "! 
gsutil cp custom.tar.gz gs://$BUCKET_NAME/census.tar.gz" ] }, { "cell_type": "markdown", "metadata": { "id": "text_create_and_deploy_model:migration" }, "source": [ "## Train a model" ] }, { "cell_type": "markdown", "metadata": { "id": "0oqIBOSnJjkW" }, "source": [ "### [projects.locations.customJobs.create](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.customJobs/create)" ] }, { "cell_type": "markdown", "metadata": { "id": "e110f8131d32" }, "source": [ "#### Request" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3c7288151426" }, "outputs": [], "source": [ "TRAIN_IMAGE = \"gcr.io/cloud-aiplatform/training/scikit-learn-cpu.0-23:latest\"\n", "\n", "JOB_NAME = \"custom_job_SKL\" + TIMESTAMP\n", "\n", "WORKER_POOL_SPEC = [\n", " {\n", " \"replica_count\": 1,\n", " \"machine_spec\": {\"machine_type\": \"n1-standard-4\"},\n", " \"python_package_spec\": {\n", " \"executor_image_uri\": TRAIN_IMAGE,\n", " \"package_uris\": [\"gs://\" + BUCKET_NAME + \"/census.tar.gz\"],\n", " \"python_module\": \"trainer.task\",\n", " \"args\": [\"--model-dir=\" + \"gs://{}/{}\".format(BUCKET_NAME, JOB_NAME)],\n", " },\n", " }\n", "]\n", "\n", "training_job = aip.CustomJob(\n", " display_name=JOB_NAME, job_spec={\"worker_pool_specs\": WORKER_POOL_SPEC}\n", ")\n", "\n", "print(\n", " MessageToJson(\n", " aip.CreateCustomJobRequest(parent=PARENT, custom_job=training_job).__dict__[\n", " \"_pb\"\n", " ]\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "datasets_import:migration,new,request" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"parent\": \"projects/migration-ucaip-training/locations/us-central1\",\n", " \"customJob\": {\n", " \"displayName\": \"custom_job_SKL20210323185534\",\n", " \"jobSpec\": {\n", " \"workerPoolSpecs\": [\n", " {\n", " \"machineSpec\": {\n", " \"machineType\": \"n1-standard-4\"\n", " },\n", " \"replicaCount\": \"1\",\n", " \"pythonPackageSpec\": {\n", " 
\"executorImageUri\": \"gcr.io/cloud-aiplatform/training/scikit-learn-cpu.0-23:latest\",\n", " \"packageUris\": [\n", " \"gs://migration-ucaip-trainingaip-20210323185534/census.tar.gz\"\n", " ],\n", " \"pythonModule\": \"trainer.task\",\n", " \"args\": [\n", " \"--model-dir=gs://migration-ucaip-trainingaip-20210323185534/custom_job_SKL20210323185534\"\n", " ]\n", " }\n", " }\n", " ]\n", " }\n", " }\n", "}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "1fcd4e82a52b" }, "source": [ "#### Call" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "fbe59127c6f6" }, "outputs": [], "source": [ "request = clients[\"job\"].create_custom_job(parent=PARENT, custom_job=training_job)" ] }, { "cell_type": "markdown", "metadata": { "id": "3dffa1c62454" }, "source": [ "#### Response" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4086d1e46b00" }, "outputs": [], "source": [ "print(MessageToJson(request.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "datasets_import:migration,new,request" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"name\": \"projects/116273516712/locations/us-central1/customJobs/3216493723709865984\",\n", " \"displayName\": \"custom_job_SKL20210323185534\",\n", " \"jobSpec\": {\n", " \"workerPoolSpecs\": [\n", " {\n", " \"machineSpec\": {\n", " \"machineType\": \"n1-standard-4\"\n", " },\n", " \"replicaCount\": \"1\",\n", " \"diskSpec\": {\n", " \"bootDiskType\": \"pd-ssd\",\n", " \"bootDiskSizeGb\": 100\n", " },\n", " \"pythonPackageSpec\": {\n", " \"executorImageUri\": \"gcr.io/cloud-aiplatform/training/scikit-learn-cpu.0-23:latest\",\n", " \"packageUris\": [\n", " \"gs://migration-ucaip-trainingaip-20210323185534/census.tar.gz\"\n", " ],\n", " \"pythonModule\": \"trainer.task\",\n", " \"args\": [\n", " \"--model-dir=gs://migration-ucaip-trainingaip-20210323185534/custom_job_SKL20210323185534\"\n", " ]\n", " }\n", " }\n", " ]\n", " },\n", " \"state\": 
\"JOB_STATE_PENDING\",\n", " \"createTime\": \"2021-03-23T18:55:41.688375Z\",\n", " \"updateTime\": \"2021-03-23T18:55:41.688375Z\"\n", "}\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "training_pipeline_id:migration,new,response" }, "outputs": [], "source": [ "# The full unique ID for the custom training job\n", "custom_training_id = request.name\n", "# The short numeric ID for the custom training job\n", "custom_training_short_id = custom_training_id.split(\"/\")[-1]\n", "\n", "print(custom_training_id)" ] }, { "cell_type": "markdown", "metadata": { "id": "0oqIBOSnJjkW" }, "source": [ "### [projects.locations.customJobs.get](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.customJobs/get)" ] }, { "cell_type": "markdown", "metadata": { "id": "dd8e5e3427d5" }, "source": [ "#### Call" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "734b3788cff1" }, "outputs": [], "source": [ "request = clients[\"job\"].get_custom_job(name=custom_training_id)" ] }, { "cell_type": "markdown", "metadata": { "id": "f145335bc684" }, "source": [ "#### Response" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "30eef648d2ec" }, "outputs": [], "source": [ "print(MessageToJson(request.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "datasets_import:migration,new,request" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"name\": \"projects/116273516712/locations/us-central1/customJobs/3216493723709865984\",\n", " \"displayName\": \"custom_job_SKL20210323185534\",\n", " \"jobSpec\": {\n", " \"workerPoolSpecs\": [\n", " {\n", " \"machineSpec\": {\n", " \"machineType\": \"n1-standard-4\"\n", " },\n", " \"replicaCount\": \"1\",\n", " \"diskSpec\": {\n", " \"bootDiskType\": \"pd-ssd\",\n", " \"bootDiskSizeGb\": 100\n", " },\n", " \"pythonPackageSpec\": {\n", " \"executorImageUri\": 
\"gcr.io/cloud-aiplatform/training/scikit-learn-cpu.0-23:latest\",\n", " \"packageUris\": [\n", " \"gs://migration-ucaip-trainingaip-20210323185534/census.tar.gz\"\n", " ],\n", " \"pythonModule\": \"trainer.task\",\n", " \"args\": [\n", " \"--model-dir=gs://migration-ucaip-trainingaip-20210323185534/custom_job_SKL20210323185534\"\n", " ]\n", " }\n", " }\n", " ]\n", " },\n", " \"state\": \"JOB_STATE_PENDING\",\n", " \"createTime\": \"2021-03-23T18:55:41.688375Z\",\n", " \"updateTime\": \"2021-03-23T18:55:41.688375Z\"\n", "}\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "trainingpipelines_get:migration,new,wait" }, "outputs": [], "source": [ "# Poll the custom job once a minute until it succeeds or fails\n", "while True:\n", "    response = clients[\"job\"].get_custom_job(name=custom_training_id)\n", "    if response.state != aip.JobState.JOB_STATE_SUCCEEDED:\n", "        print(\"Training job has not completed:\", response.state)\n", "        if response.state == aip.JobState.JOB_STATE_FAILED:\n", "            break\n", "    else:\n", "        print(\"Training Time:\", response.end_time - response.start_time)\n", "        break\n", "    time.sleep(60)\n", "\n", "# model artifact output directory on Google Cloud Storage\n", "model_artifact_dir = (\n", "    response.job_spec.worker_pool_specs[0].python_package_spec.args[0].split(\"=\")[-1]\n", ")\n", "print(\"artifact location \" + model_artifact_dir)" ] }, { "cell_type": "markdown", "metadata": { "id": "576a7e7ce36c" }, "source": [ "## Deploy the model" ] }, { "cell_type": "markdown", "metadata": { "id": "COwVZtxhJjkW" }, "source": [ "### [projects.locations.models.upload](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.models/upload)" ] }, { "cell_type": "markdown", "metadata": { "id": "fed9fd1f70cf" }, "source": [ "#### Request" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aedf59150ec9" }, "outputs": [], "source": [ "DEPLOY_IMAGE = \"gcr.io/cloud-aiplatform/prediction/sklearn-cpu.0-23:latest\"\n", "\n", "model = 
{\n", " \"display_name\": \"custom_job_SKL\" + TIMESTAMP,\n", " \"artifact_uri\": model_artifact_dir,\n", " \"container_spec\": {\"image_uri\": DEPLOY_IMAGE, \"ports\": [{\"container_port\": 8080}]},\n", "}\n", "\n", "print(MessageToJson(aip.UploadModelRequest(parent=PARENT, model=model).__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "datasets_import:migration,new,request" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"parent\": \"projects/migration-ucaip-training/locations/us-central1\",\n", " \"model\": {\n", " \"displayName\": \"custom_job_SKL20210323185534\",\n", " \"containerSpec\": {\n", " \"imageUri\": \"gcr.io/cloud-aiplatform/prediction/sklearn-cpu.0-23:latest\",\n", " \"ports\": [\n", " {\n", " \"containerPort\": 8080\n", " }\n", " ]\n", " },\n", " \"artifactUri\": \"gs://migration-ucaip-trainingaip-20210323185534/custom_job_SKL20210323185534\"\n", " }\n", "}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "8641495cc6f9" }, "source": [ "#### Call" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5a967bc5db1b" }, "outputs": [], "source": [ "request = clients[\"model\"].upload_model(parent=PARENT, model=model)" ] }, { "cell_type": "markdown", "metadata": { "id": "2118f3c18d0f" }, "source": [ "#### Response" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1de1269f8216" }, "outputs": [], "source": [ "result = request.result()\n", "\n", "print(MessageToJson(result.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "datasets_import:migration,new,request" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"model\": \"projects/116273516712/locations/us-central1/models/5984808915752189952\"\n", "}\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0b2a551124e1" }, "outputs": [], "source": [ "model_id = result.model" ] }, { "cell_type": "markdown", "metadata": { "id": 
"make_batch_predictions:migration" }, "source": [ "## Make batch predictions\n" ] }, { "cell_type": "markdown", "metadata": { "id": "make_batch_prediction_file:migration,new" }, "source": [ "### Make a batch prediction file\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "get_test_items:automl,icn,csv" }, "outputs": [], "source": [ "import json\n", "\n", "import tensorflow as tf\n", "\n", "INSTANCES = [\n", " [\n", " 25,\n", " \"Private\",\n", " 226802,\n", " \"11th\",\n", " 7,\n", " \"Never-married\",\n", " \"Machine-op-inspct\",\n", " \"Own-child\",\n", " \"Black\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 38,\n", " \"Private\",\n", " 89814,\n", " \"HS-grad\",\n", " 9,\n", " \"Married-civ-spouse\",\n", " \"Farming-fishing\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 50,\n", " \"United-States\",\n", " ],\n", " [\n", " 28,\n", " \"Local-gov\",\n", " 336951,\n", " \"Assoc-acdm\",\n", " 12,\n", " \"Married-civ-spouse\",\n", " \"Protective-serv\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 44,\n", " \"Private\",\n", " 160323,\n", " \"Some-college\",\n", " 10,\n", " \"Married-civ-spouse\",\n", " \"Machine-op-inspct\",\n", " \"Husband\",\n", " \"Black\",\n", " \"Male\",\n", " 7688,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 18,\n", " \"?\",\n", " 103497,\n", " \"Some-college\",\n", " 10,\n", " \"Never-married\",\n", " \"?\",\n", " \"Own-child\",\n", " \"White\",\n", " \"Female\",\n", " 0,\n", " 0,\n", " 30,\n", " \"United-States\",\n", " ],\n", " [\n", " 34,\n", " \"Private\",\n", " 198693,\n", " \"10th\",\n", " 6,\n", " \"Never-married\",\n", " \"Other-service\",\n", " \"Not-in-family\",\n", " \"White\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 30,\n", " \"United-States\",\n", " ],\n", " [\n", " 29,\n", " \"?\",\n", " 227026,\n", " \"HS-grad\",\n", " 
9,\n", " \"Never-married\",\n", " \"?\",\n", " \"Unmarried\",\n", " \"Black\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 63,\n", " \"Self-emp-not-inc\",\n", " 104626,\n", " \"Prof-school\",\n", " 15,\n", " \"Married-civ-spouse\",\n", " \"Prof-specialty\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 3103,\n", " 0,\n", " 32,\n", " \"United-States\",\n", " ],\n", " [\n", " 24,\n", " \"Private\",\n", " 369667,\n", " \"Some-college\",\n", " 10,\n", " \"Never-married\",\n", " \"Other-service\",\n", " \"Unmarried\",\n", " \"White\",\n", " \"Female\",\n", " 0,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 55,\n", " \"Private\",\n", " 104996,\n", " \"7th-8th\",\n", " 4,\n", " \"Married-civ-spouse\",\n", " \"Craft-repair\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 10,\n", " \"United-States\",\n", " ],\n", "]\n", "\n", "gcs_input_uri = \"gs://\" + BUCKET_NAME + \"/\" + \"test.jsonl\"\n", "with tf.io.gfile.GFile(gcs_input_uri, \"w\") as f:\n", "    for i in INSTANCES:\n", "        f.write(json.dumps(i) + \"\\n\")\n", "\n", "! 
gsutil cat $gcs_input_uri" ] }, { "cell_type": "markdown", "metadata": { "id": "datasets_import:migration,new,request" }, "source": [ "*Example output*:\n", "```\n", "[25, \"Private\", 226802, \"11th\", 7, \"Never-married\", \"Machine-op-inspct\", \"Own-child\", \"Black\", \"Male\", 0, 0, 40, \"United-States\"]\n", "[38, \"Private\", 89814, \"HS-grad\", 9, \"Married-civ-spouse\", \"Farming-fishing\", \"Husband\", \"White\", \"Male\", 0, 0, 50, \"United-States\"]\n", "[28, \"Local-gov\", 336951, \"Assoc-acdm\", 12, \"Married-civ-spouse\", \"Protective-serv\", \"Husband\", \"White\", \"Male\", 0, 0, 40, \"United-States\"]\n", "[44, \"Private\", 160323, \"Some-college\", 10, \"Married-civ-spouse\", \"Machine-op-inspct\", \"Husband\", \"Black\", \"Male\", 7688, 0, 40, \"United-States\"]\n", "[18, \"?\", 103497, \"Some-college\", 10, \"Never-married\", \"?\", \"Own-child\", \"White\", \"Female\", 0, 0, 30, \"United-States\"]\n", "[34, \"Private\", 198693, \"10th\", 6, \"Never-married\", \"Other-service\", \"Not-in-family\", \"White\", \"Male\", 0, 0, 30, \"United-States\"]\n", "[29, \"?\", 227026, \"HS-grad\", 9, \"Never-married\", \"?\", \"Unmarried\", \"Black\", \"Male\", 0, 0, 40, \"United-States\"]\n", "[63, \"Self-emp-not-inc\", 104626, \"Prof-school\", 15, \"Married-civ-spouse\", \"Prof-specialty\", \"Husband\", \"White\", \"Male\", 3103, 0, 32, \"United-States\"]\n", "[24, \"Private\", 369667, \"Some-college\", 10, \"Never-married\", \"Other-service\", \"Unmarried\", \"White\", \"Female\", 0, 0, 40, \"United-States\"]\n", "[55, \"Private\", 104996, \"7th-8th\", 4, \"Married-civ-spouse\", \"Craft-repair\", \"Husband\", \"White\", \"Male\", 0, 0, 10, \"United-States\"]\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "batchpredictionjobs_create:migration,new" }, "source": [ "### [projects.locations.batchPredictionJobs.create](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.batchPredictionJobs/create)\n" ] }, { 
"cell_type": "markdown", "metadata": { "id": "request:migration" }, "source": [ "#### Request\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "batchpredictionjobs_create:migration,new,request,icn" }, "outputs": [], "source": [ "model_parameters = Value(\n", " struct_value=Struct(\n", " fields={\n", " \"confidence_threshold\": Value(number_value=0.5),\n", " \"max_predictions\": Value(number_value=10000.0),\n", " }\n", " )\n", ")\n", "\n", "batch_prediction_job = {\n", " \"display_name\": \"custom_job_SKL\" + TIMESTAMP,\n", " \"model\": model_id,\n", " \"input_config\": {\n", " \"instances_format\": \"jsonl\",\n", " \"gcs_source\": {\"uris\": [gcs_input_uri]},\n", " },\n", " \"model_parameters\": model_parameters,\n", " \"output_config\": {\n", " \"predictions_format\": \"jsonl\",\n", " \"gcs_destination\": {\n", " \"output_uri_prefix\": \"gs://\" + f\"{BUCKET_NAME}/batch_output/\"\n", " },\n", " },\n", " \"dedicated_resources\": {\n", " \"machine_spec\": {\"machine_type\": \"n1-standard-2\"},\n", " \"starting_replica_count\": 1,\n", " \"max_replica_count\": 1,\n", " },\n", "}\n", "\n", "print(\n", " MessageToJson(\n", " aip.CreateBatchPredictionJobRequest(\n", " parent=PARENT, batch_prediction_job=batch_prediction_job\n", " ).__dict__[\"_pb\"]\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "batchpredictionjobs_create:migration,new,request,icn" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"parent\": \"projects/migration-ucaip-training/locations/us-central1\",\n", " \"batchPredictionJob\": {\n", " \"displayName\": \"custom_job_SKL20210323185534\",\n", " \"model\": \"projects/116273516712/locations/us-central1/models/5984808915752189952\",\n", " \"inputConfig\": {\n", " \"instancesFormat\": \"jsonl\",\n", " \"gcsSource\": {\n", " \"uris\": [\n", " \"gs://migration-ucaip-trainingaip-20210323185534/test.jsonl\"\n", " ]\n", " }\n", " },\n", " \"modelParameters\": {\n", " \"confidence_threshold\": 0.5,\n", " 
\"max_predictions\": 10000.0\n", " },\n", " \"outputConfig\": {\n", " \"predictionsFormat\": \"jsonl\",\n", " \"gcsDestination\": {\n", " \"outputUriPrefix\": \"gs://migration-ucaip-trainingaip-20210323185534/batch_output/\"\n", " }\n", " },\n", " \"dedicatedResources\": {\n", " \"machineSpec\": {\n", " \"machineType\": \"n1-standard-2\"\n", " },\n", " \"startingReplicaCount\": 1,\n", " \"maxReplicaCount\": 1\n", " }\n", " }\n", "}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "call:migration" }, "source": [ "#### Call\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "batchpredictionjobs_create:migration,new,call" }, "outputs": [], "source": [ "request = clients[\"job\"].create_batch_prediction_job(\n", " parent=PARENT, batch_prediction_job=batch_prediction_job\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "response:migration" }, "source": [ "#### Response\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "print:migration,new,request" }, "outputs": [], "source": [ "print(MessageToJson(request.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "batchpredictionjobs_create:migration,new,response,icn" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"name\": \"projects/116273516712/locations/us-central1/batchPredictionJobs/2509428582212698112\",\n", " \"displayName\": \"custom_job_SKL20210323185534\",\n", " \"model\": \"projects/116273516712/locations/us-central1/models/5984808915752189952\",\n", " \"inputConfig\": {\n", " \"instancesFormat\": \"jsonl\",\n", " \"gcsSource\": {\n", " \"uris\": [\n", " \"gs://migration-ucaip-trainingaip-20210323185534/test.jsonl\"\n", " ]\n", " }\n", " },\n", " \"modelParameters\": {\n", " \"max_predictions\": 10000.0,\n", " \"confidence_threshold\": 0.5\n", " },\n", " \"outputConfig\": {\n", " \"predictionsFormat\": \"jsonl\",\n", " \"gcsDestination\": {\n", " \"outputUriPrefix\": 
\"gs://migration-ucaip-trainingaip-20210323185534/batch_output/\"\n", " }\n", " },\n", " \"dedicatedResources\": {\n", " \"machineSpec\": {\n", " \"machineType\": \"n1-standard-2\"\n", " },\n", " \"startingReplicaCount\": 1,\n", " \"maxReplicaCount\": 1\n", " },\n", " \"manualBatchTuningParameters\": {},\n", " \"state\": \"JOB_STATE_PENDING\",\n", " \"createTime\": \"2021-03-23T19:05:07.344290Z\",\n", " \"updateTime\": \"2021-03-23T19:05:07.344290Z\"\n", "}\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "batch_job_id:migration,new,response" }, "outputs": [], "source": [ "# The fully qualified ID for the batch job\n", "batch_job_id = request.name\n", "# The short numeric ID for the batch job\n", "batch_job_short_id = batch_job_id.split(\"/\")[-1]\n", "\n", "print(batch_job_id)" ] }, { "cell_type": "markdown", "metadata": { "id": "batchpredictionjobs_get:migration,new" }, "source": [ "### [projects.locations.batchPredictionJobs.get](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.batchPredictionJobs/get)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "call:migration" }, "source": [ "#### Call\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "batchpredictionjobs_get:migration,new,call" }, "outputs": [], "source": [ "request = clients[\"job\"].get_batch_prediction_job(name=batch_job_id)" ] }, { "cell_type": "markdown", "metadata": { "id": "response:migration" }, "source": [ "#### Response\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "print:migration,new,request" }, "outputs": [], "source": [ "print(MessageToJson(request.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "batchpredictionjobs_get:migration,new,response,icn" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"name\": \"projects/116273516712/locations/us-central1/batchPredictionJobs/2509428582212698112\",\n", " \"displayName\": 
\"custom_job_SKL20210323185534\",\n", " \"model\": \"projects/116273516712/locations/us-central1/models/5984808915752189952\",\n", " \"inputConfig\": {\n", " \"instancesFormat\": \"jsonl\",\n", " \"gcsSource\": {\n", " \"uris\": [\n", " \"gs://migration-ucaip-trainingaip-20210323185534/test.jsonl\"\n", " ]\n", " }\n", " },\n", " \"modelParameters\": {\n", " \"confidence_threshold\": 0.5,\n", " \"max_predictions\": 10000.0\n", " },\n", " \"outputConfig\": {\n", " \"predictionsFormat\": \"jsonl\",\n", " \"gcsDestination\": {\n", " \"outputUriPrefix\": \"gs://migration-ucaip-trainingaip-20210323185534/batch_output/\"\n", " }\n", " },\n", " \"dedicatedResources\": {\n", " \"machineSpec\": {\n", " \"machineType\": \"n1-standard-2\"\n", " },\n", " \"startingReplicaCount\": 1,\n", " \"maxReplicaCount\": 1\n", " },\n", " \"manualBatchTuningParameters\": {},\n", " \"state\": \"JOB_STATE_PENDING\",\n", " \"createTime\": \"2021-03-23T19:05:07.344290Z\",\n", " \"updateTime\": \"2021-03-23T19:05:07.344290Z\"\n", "}\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "batchpredictionjobs_get:migration,new,wait" }, "outputs": [], "source": [ "def get_latest_predictions(gcs_out_dir):\n", " \"\"\" Get the latest prediction subfolder using the timestamp in the subfolder name\"\"\"\n", " folders = !gsutil ls $gcs_out_dir\n", " latest = \"\"\n", " for folder in folders:\n", " subfolder = folder.split(\"/\")[-2]\n", " if subfolder.startswith(\"prediction-\"):\n", " if subfolder > latest:\n", " latest = folder[:-1]\n", " return latest\n", "\n", "\n", "while True:\n", " response = clients[\"job\"].get_batch_prediction_job(name=batch_job_id)\n", " if response.state != aip.JobState.JOB_STATE_SUCCEEDED:\n", " print(\"The job has not completed:\", response.state)\n", " if response.state == aip.JobState.JOB_STATE_FAILED:\n", " break\n", " else:\n", " folder = get_latest_predictions(\n", " response.output_config.gcs_destination.output_uri_prefix\n", " )\n", " ! 
gsutil ls $folder/prediction*\n", "\n", " ! gsutil cat -h $folder/prediction*\n", " break\n", " time.sleep(60)" ] }, { "cell_type": "markdown", "metadata": { "id": "batchpredictionjobs_get:migration,new,wait,icn" }, "source": [ "*Example output*:\n", "```\n", "==> gs://migration-ucaip-trainingaip-20210323185534/batch_output/prediction-custom_job_SKL20210323185534-2021_03_23T12_05_07_282Z/prediction.errors_stats-00000-of-00001 <==\n", "\n", "==> gs://migration-ucaip-trainingaip-20210323185534/batch_output/prediction-custom_job_SKL20210323185534-2021_03_23T12_05_07_282Z/prediction.results-00000-of-00001 <==\n", "{\"instance\": [25, \"Private\", 226802, \"11th\", 7, \"Never-married\", \"Machine-op-inspct\", \"Own-child\", \"Black\", \"Male\", 0, 0, 40, \"United-States\"], \"prediction\": false}\n", "{\"instance\": [38, \"Private\", 89814, \"HS-grad\", 9, \"Married-civ-spouse\", \"Farming-fishing\", \"Husband\", \"White\", \"Male\", 0, 0, 50, \"United-States\"], \"prediction\": false}\n", "{\"instance\": [28, \"Local-gov\", 336951, \"Assoc-acdm\", 12, \"Married-civ-spouse\", \"Protective-serv\", \"Husband\", \"White\", \"Male\", 0, 0, 40, \"United-States\"], \"prediction\": false}\n", "{\"instance\": [44, \"Private\", 160323, \"Some-college\", 10, \"Married-civ-spouse\", \"Machine-op-inspct\", \"Husband\", \"Black\", \"Male\", 7688, 0, 40, \"United-States\"], \"prediction\": true}\n", "{\"instance\": [18, \"?\", 103497, \"Some-college\", 10, \"Never-married\", \"?\", \"Own-child\", \"White\", \"Female\", 0, 0, 30, \"United-States\"], \"prediction\": false}\n", "{\"instance\": [34, \"Private\", 198693, \"10th\", 6, \"Never-married\", \"Other-service\", \"Not-in-family\", \"White\", \"Male\", 0, 0, 30, \"United-States\"], \"prediction\": false}\n", "{\"instance\": [29, \"?\", 227026, \"HS-grad\", 9, \"Never-married\", \"?\", \"Unmarried\", \"Black\", \"Male\", 0, 0, 40, \"United-States\"], \"prediction\": false}\n", "{\"instance\": [63, \"Self-emp-not-inc\", 104626, 
\"Prof-school\", 15, \"Married-civ-spouse\", \"Prof-specialty\", \"Husband\", \"White\", \"Male\", 3103, 0, 32, \"United-States\"], \"prediction\": false}\n", "{\"instance\": [24, \"Private\", 369667, \"Some-college\", 10, \"Never-married\", \"Other-service\", \"Unmarried\", \"White\", \"Female\", 0, 0, 40, \"United-States\"], \"prediction\": false}\n", "{\"instance\": [55, \"Private\", 104996, \"7th-8th\", 4, \"Married-civ-spouse\", \"Craft-repair\", \"Husband\", \"White\", \"Male\", 0, 0, 10, \"United-States\"], \"prediction\": false}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "be2ec9a417b1" }, "source": [ "## Make online predictions" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_create:migration,new" }, "source": [ "### [projects.locations.endpoints.create](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.endpoints/create)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "request:migration" }, "source": [ "#### Request\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "endpoints_create:migration,new,request" }, "outputs": [], "source": [ "endpoint = {\"display_name\": \"custom_job_SKL\" + TIMESTAMP}\n", "\n", "print(\n", " MessageToJson(\n", " aip.CreateEndpointRequest(parent=PARENT, endpoint=endpoint).__dict__[\"_pb\"]\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_create:migration,new,request" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"parent\": \"projects/migration-ucaip-training/locations/us-central1\",\n", " \"endpoint\": {\n", " \"displayName\": \"custom_job_SKL20210323185534\"\n", " }\n", "}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "call:migration" }, "source": [ "#### Call\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "endpoints_create:migration,new,call" }, "outputs": [], "source": [ "request = clients[\"endpoint\"].create_endpoint(parent=PARENT, 
endpoint=endpoint)" ] }, { "cell_type": "markdown", "metadata": { "id": "response:migration" }, "source": [ "#### Response\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "print:migration,new,response" }, "outputs": [], "source": [ "result = request.result()\n", "\n", "print(MessageToJson(result.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_create:migration,new,response" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"name\": \"projects/116273516712/locations/us-central1/endpoints/695823734614786048\"\n", "}\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "endpoint_id:migration,new,response" }, "outputs": [], "source": [ "# The full unique ID for the endpoint\n", "endpoint_id = result.name\n", "# The short numeric ID for the endpoint\n", "endpoint_short_id = endpoint_id.split(\"/\")[-1]\n", "\n", "print(endpoint_id)" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_deploymodel:migration,new" }, "source": [ "### [projects.locations.endpoints.deployModel](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.endpoints/deployModel)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "request:migration" }, "source": [ "#### Request\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "endpoints_deploymodel:migration,new,request" }, "outputs": [], "source": [ "deployed_model = {\n", " \"model\": model_id,\n", " \"display_name\": \"custom_job_SKL\" + TIMESTAMP,\n", " \"dedicated_resources\": {\n", " \"min_replica_count\": 1,\n", " \"max_replica_count\": 1,\n", " \"machine_spec\": {\"machine_type\": \"n1-standard-4\", \"accelerator_count\": 0},\n", " },\n", "}\n", "\n", "print(\n", " MessageToJson(\n", " aip.DeployModelRequest(\n", " endpoint=endpoint_id,\n", " deployed_model=deployed_model,\n", " traffic_split={\"0\": 100},\n", " ).__dict__[\"_pb\"]\n", " )\n", ")" ] }, { "cell_type": 
"markdown", "metadata": { "id": "endpoints_deploymodel:migration,new,request" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"endpoint\": \"projects/116273516712/locations/us-central1/endpoints/695823734614786048\",\n", " \"deployedModel\": {\n", " \"model\": \"projects/116273516712/locations/us-central1/models/5984808915752189952\",\n", " \"displayName\": \"custom_job_SKL20210323185534\",\n", " \"dedicatedResources\": {\n", " \"machineSpec\": {\n", " \"machineType\": \"n1-standard-4\"\n", " },\n", " \"minReplicaCount\": 1,\n", " \"maxReplicaCount\": 1\n", " }\n", " },\n", " \"trafficSplit\": {\n", " \"0\": 100\n", " }\n", "}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "call:migration" }, "source": [ "#### Call\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "endpoints_deploymodel:migration,new,call" }, "outputs": [], "source": [ "request = clients[\"endpoint\"].deploy_model(\n", " endpoint=endpoint_id, deployed_model=deployed_model, traffic_split={\"0\": 100}\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "response:migration" }, "source": [ "#### Response\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "print:migration,new,response" }, "outputs": [], "source": [ "result = request.result()\n", "\n", "print(MessageToJson(result.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_deploymodel:migration,new,response" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"deployedModel\": {\n", " \"id\": \"6653241616695820288\"\n", " }\n", "}\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "deployed_model_id:migration,new,response" }, "outputs": [], "source": [ "# The unique ID for the deployed model\n", "deployed_model_id = result.deployed_model.id\n", "\n", "print(deployed_model_id)" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_predict:migration,new" }, "source": [ "### 
[projects.locations.endpoints.predict](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.endpoints/predict)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "0d0bbb13eea3" }, "source": [ "### Prepare file for online prediction" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3b1abc212402" }, "outputs": [], "source": [ "INSTANCES = [\n", " [\n", " 25,\n", " \"Private\",\n", " 226802,\n", " \"11th\",\n", " 7,\n", " \"Never-married\",\n", " \"Machine-op-inspct\",\n", " \"Own-child\",\n", " \"Black\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 38,\n", " \"Private\",\n", " 89814,\n", " \"HS-grad\",\n", " 9,\n", " \"Married-civ-spouse\",\n", " \"Farming-fishing\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 50,\n", " \"United-States\",\n", " ],\n", " [\n", " 28,\n", " \"Local-gov\",\n", " 336951,\n", " \"Assoc-acdm\",\n", " 12,\n", " \"Married-civ-spouse\",\n", " \"Protective-serv\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 44,\n", " \"Private\",\n", " 160323,\n", " \"Some-college\",\n", " 10,\n", " \"Married-civ-spouse\",\n", " \"Machine-op-inspct\",\n", " \"Husband\",\n", " \"Black\",\n", " \"Male\",\n", " 7688,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 18,\n", " \"?\",\n", " 103497,\n", " \"Some-college\",\n", " 10,\n", " \"Never-married\",\n", " \"?\",\n", " \"Own-child\",\n", " \"White\",\n", " \"Female\",\n", " 0,\n", " 0,\n", " 30,\n", " \"United-States\",\n", " ],\n", " [\n", " 34,\n", " \"Private\",\n", " 198693,\n", " \"10th\",\n", " 6,\n", " \"Never-married\",\n", " \"Other-service\",\n", " \"Not-in-family\",\n", " \"White\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 30,\n", " \"United-States\",\n", " ],\n", " [\n", " 29,\n", " \"?\",\n", " 227026,\n", " \"HS-grad\",\n", " 9,\n", " \"Never-married\",\n", " 
\"?\",\n", " \"Unmarried\",\n", " \"Black\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 63,\n", " \"Self-emp-not-inc\",\n", " 104626,\n", " \"Prof-school\",\n", " 15,\n", " \"Married-civ-spouse\",\n", " \"Prof-specialty\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 3103,\n", " 0,\n", " 32,\n", " \"United-States\",\n", " ],\n", " [\n", " 24,\n", " \"Private\",\n", " 369667,\n", " \"Some-college\",\n", " 10,\n", " \"Never-married\",\n", " \"Other-service\",\n", " \"Unmarried\",\n", " \"White\",\n", " \"Female\",\n", " 0,\n", " 0,\n", " 40,\n", " \"United-States\",\n", " ],\n", " [\n", " 55,\n", " \"Private\",\n", " 104996,\n", " \"7th-8th\",\n", " 4,\n", " \"Married-civ-spouse\",\n", " \"Craft-repair\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 0,\n", " 0,\n", " 10,\n", " \"United-States\",\n", " ],\n", "]" ] }, { "cell_type": "markdown", "metadata": { "id": "request:migration" }, "source": [ "#### Request\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "endpoints_predict:migration,new,request,icn" }, "outputs": [], "source": [ "prediction_request = aip.PredictRequest(endpoint=endpoint_id)\n", "prediction_request.instances.append(INSTANCES)\n", "\n", "print(MessageToJson(prediction_request.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_deploymodel:migration,new,request" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"endpoint\": \"projects/116273516712/locations/us-central1/endpoints/695823734614786048\",\n", " \"instances\": [\n", " [\n", " [\n", " 25.0,\n", " \"Private\",\n", " 226802.0,\n", " \"11th\",\n", " 7.0,\n", " \"Never-married\",\n", " \"Machine-op-inspct\",\n", " \"Own-child\",\n", " \"Black\",\n", " \"Male\",\n", " 0.0,\n", " 0.0,\n", " 40.0,\n", " \"United-States\"\n", " ],\n", " [\n", " 38.0,\n", " \"Private\",\n", " 89814.0,\n", " \"HS-grad\",\n", " 9.0,\n", " \"Married-civ-spouse\",\n", " 
\"Farming-fishing\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 0.0,\n", " 0.0,\n", " 50.0,\n", " \"United-States\"\n", " ],\n", " [\n", " 28.0,\n", " \"Local-gov\",\n", " 336951.0,\n", " \"Assoc-acdm\",\n", " 12.0,\n", " \"Married-civ-spouse\",\n", " \"Protective-serv\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 0.0,\n", " 0.0,\n", " 40.0,\n", " \"United-States\"\n", " ],\n", " [\n", " 44.0,\n", " \"Private\",\n", " 160323.0,\n", " \"Some-college\",\n", " 10.0,\n", " \"Married-civ-spouse\",\n", " \"Machine-op-inspct\",\n", " \"Husband\",\n", " \"Black\",\n", " \"Male\",\n", " 7688.0,\n", " 0.0,\n", " 40.0,\n", " \"United-States\"\n", " ],\n", " [\n", " 18.0,\n", " \"?\",\n", " 103497.0,\n", " \"Some-college\",\n", " 10.0,\n", " \"Never-married\",\n", " \"?\",\n", " \"Own-child\",\n", " \"White\",\n", " \"Female\",\n", " 0.0,\n", " 0.0,\n", " 30.0,\n", " \"United-States\"\n", " ],\n", " [\n", " 34.0,\n", " \"Private\",\n", " 198693.0,\n", " \"10th\",\n", " 6.0,\n", " \"Never-married\",\n", " \"Other-service\",\n", " \"Not-in-family\",\n", " \"White\",\n", " \"Male\",\n", " 0.0,\n", " 0.0,\n", " 30.0,\n", " \"United-States\"\n", " ],\n", " [\n", " 29.0,\n", " \"?\",\n", " 227026.0,\n", " \"HS-grad\",\n", " 9.0,\n", " \"Never-married\",\n", " \"?\",\n", " \"Unmarried\",\n", " \"Black\",\n", " \"Male\",\n", " 0.0,\n", " 0.0,\n", " 40.0,\n", " \"United-States\"\n", " ],\n", " [\n", " 63.0,\n", " \"Self-emp-not-inc\",\n", " 104626.0,\n", " \"Prof-school\",\n", " 15.0,\n", " \"Married-civ-spouse\",\n", " \"Prof-specialty\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 3103.0,\n", " 0.0,\n", " 32.0,\n", " \"United-States\"\n", " ],\n", " [\n", " 24.0,\n", " \"Private\",\n", " 369667.0,\n", " \"Some-college\",\n", " 10.0,\n", " \"Never-married\",\n", " \"Other-service\",\n", " \"Unmarried\",\n", " \"White\",\n", " \"Female\",\n", " 0.0,\n", " 0.0,\n", " 40.0,\n", " \"United-States\"\n", " ],\n", " [\n", " 55.0,\n", " 
\"Private\",\n", " 104996.0,\n", " \"7th-8th\",\n", " 4.0,\n", " \"Married-civ-spouse\",\n", " \"Craft-repair\",\n", " \"Husband\",\n", " \"White\",\n", " \"Male\",\n", " 0.0,\n", " 0.0,\n", " 10.0,\n", " \"United-States\"\n", " ]\n", " ]\n", " ]\n", "}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "call:migration" }, "source": [ "#### Call\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "endpoints_predict:migration,new,call" }, "outputs": [], "source": [ "request = clients[\"prediction\"].predict(endpoint=endpoint_id, instances=INSTANCES)" ] }, { "cell_type": "markdown", "metadata": { "id": "response:migration" }, "source": [ "#### Response\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "print:migration,new,request" }, "outputs": [], "source": [ "print(MessageToJson(request.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_predict:migration,new,response,icn" }, "source": [ "*Example output*:\n", "```\n", "{\n", " \"predictions\": [\n", " false,\n", " false,\n", " false,\n", " true,\n", " false,\n", " false,\n", " false,\n", " false,\n", " false,\n", " false\n", " ],\n", " \"deployedModelId\": \"6653241616695820288\"\n", "}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_undeploymodel:migration,new" }, "source": [ "### [projects.locations.endpoints.undeployModel](https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/projects.locations.endpoints/undeployModel)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "call:migration" }, "source": [ "#### Call\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "endpoints_undeploymodel:migration,new,call" }, "outputs": [], "source": [ "request = clients[\"endpoint\"].undeploy_model(\n", " endpoint=endpoint_id, deployed_model_id=deployed_model_id, traffic_split={}\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "response:migration" }, 
"source": [ "#### Response\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "print:migration,new,response" }, "outputs": [], "source": [ "result = request.result()\n", "\n", "print(MessageToJson(result.__dict__[\"_pb\"]))" ] }, { "cell_type": "markdown", "metadata": { "id": "endpoints_undeploymodel:migration,new,response" }, "source": [ "*Example output*:\n", "```\n", "{}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "cleanup:migration,new" }, "source": [ "# Cleaning up\n", "\n", "To clean up all GCP resources used in this project, you can [delete the GCP\n", "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", "\n", "Otherwise, you can delete the individual resources you created in this tutorial.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cleanup:migration,new" }, "outputs": [], "source": [ "delete_model = True\n", "delete_endpoint = True\n", "delete_pipeline = True\n", "delete_batchjob = True\n", "delete_bucket = True\n", "\n", "# Delete the model using the Vertex AI fully qualified identifier for the model\n", "try:\n", " if delete_model:\n", " clients[\"model\"].delete_model(name=model_id)\n", "except Exception as e:\n", " print(e)\n", "\n", "# Delete the endpoint using the Vertex AI fully qualified identifier for the endpoint\n", "try:\n", " if delete_endpoint:\n", " clients[\"endpoint\"].delete_endpoint(name=endpoint_id)\n", "except Exception as e:\n", " print(e)\n", "\n", "# Delete the custom training job using the Vertex AI fully qualified identifier for the custom training job\n", "try:\n", " if delete_pipeline:\n", " clients[\"job\"].delete_custom_job(name=custom_training_id)\n", "except Exception as e:\n", " print(e)\n", "\n", "# Delete the batch job using the Vertex AI fully qualified identifier for the batch job\n", "try:\n", " if delete_batchjob:\n", " 
clients[\"job\"].delete_batch_prediction_job(name=batch_job_id)\n", "except Exception as e:\n", " print(e)\n", "\n", "if delete_bucket and \"BUCKET_NAME\" in globals():\n", " ! gsutil rm -r gs://$BUCKET_NAME" ] } ], "metadata": { "colab": { "name": "UJ10 unified Custom Training Prebuilt Container SKLearn.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }
